feat: Add benchmarks #7
Merged
… in load_tasks, compute ndcg5 in cache mode
…dowing, asserts, redundant slice
Add one-liner docstrings to all functions and methods across benchmarks/common.py, run_benchmark.py, and sync_repos.py. Remove the D ruff ignore for benchmarks/*.py so docstrings are enforced going forward. Also moves count_indexed_targets into run_benchmark.py (where Chunk is imported) to fix a pre-existing mypy Protocol error in the pre-commit env.
Full runs (no --repo/--language filters) automatically write results to benchmarks/results/<sha>.json, keyed by the 12-char git SHA. The file includes the full SHA, model name, per-repo rows, language aggregates, and overall summary. Cache mode writes <sha>-cache.json. Filtered runs are not saved.
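The naming rule described in this commit could be sketched roughly as follows. This is an illustration only, not the code in run_benchmark.py: the function names `current_sha`, `results_path`, and `save_results`, the `filtered` flag, and the payload layout are all hypothetical. Only the 12-character SHA prefix, the `-cache` suffix, and the rule that filtered runs are not saved come from the commit message.

```python
import json
import subprocess
from pathlib import Path
from typing import Optional


def current_sha() -> str:
    """Return the full SHA of HEAD (assumes the script runs inside a git checkout)."""
    return subprocess.check_output(["git", "rev-parse", "HEAD"], text=True).strip()


def results_path(full_sha: str, results_dir: Path, cache_mode: bool = False) -> Path:
    """Build the file name from the first 12 characters of the commit SHA."""
    suffix = "-cache" if cache_mode else ""
    return results_dir / f"{full_sha[:12]}{suffix}.json"


def save_results(
    payload: dict,
    full_sha: str,
    results_dir: Path,
    filtered: bool,
    cache_mode: bool = False,
) -> Optional[Path]:
    """Write results for a full run; filtered runs are skipped and return None."""
    if filtered:
        return None
    path = results_path(full_sha, results_dir, cache_mode)
    path.parent.mkdir(parents=True, exist_ok=True)
    # Store the full SHA inside the file, since the file name only holds a prefix.
    path.write_text(json.dumps({"sha": full_sha, **payload}, indent=2))
    return path
```

Keying the file by commit means two full runs on the same commit overwrite each other, which seems to be the intent: one result file per revision.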
Drop the --cache mode (cold vs warm build timing) — it was noisy and not actionable. Instead, add index_ms to RepoResult so every full run records index build time per repo alongside NDCG and query latency. index_ms is included in the saved JSON and printed in the summary table.
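A minimal sketch of what recording index build time per repo might look like. Only `RepoResult` and `index_ms` are named in the commit; the other fields (`repo`, `ndcg`, `query_ms`) and the `timed_ms` helper are assumptions made for illustration.

```python
import time
from dataclasses import asdict, dataclass
from typing import Callable, Tuple


@dataclass
class RepoResult:
    """One row of benchmark output (field names other than index_ms are hypothetical)."""

    repo: str
    ndcg: float
    query_ms: float
    index_ms: float


def timed_ms(fn: Callable[[], object]) -> Tuple[object, float]:
    """Run fn once and return its result plus the elapsed wall time in milliseconds."""
    start = time.perf_counter()
    value = fn()
    return value, (time.perf_counter() - start) * 1000.0
```

Wrapping the index build in `timed_ms` and storing the second return value in `index_ms` gives every full run a build-time figure, and `asdict(result)` yields a plain dict that can go straight into the saved JSON.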
…, inline _output
- benchmarks/common.py -> benchmarks/data.py (more descriptive name)
- BENCH_ROOT: /tmp/bench -> ~/.cache/semble-bench (survives reboots)
- Inline _output into _check_repo (single call site)
- Update README to drop --cache docs and reflect new paths
Pringled added a commit that referenced this pull request on Apr 17, 2026
… annotation audit
- Fix n_relevant to use annotation count instead of index coverage (reviewer #5)
- Add per-category NDCG@10 to printed summary and saved JSON (reviewer #7)
- Replace 11 trivially-lexical semantic queries with vocabulary-diverse alternatives
- Baseline: NDCG@10 = 0.825 (architecture=0.773, semantic=0.823, symbol=0.943)
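To make the n_relevant fix and the per-category breakdown concrete, here is a rough sketch of NDCG@k. It assumes binary relevance and a ranked list without duplicates, neither of which the commit message confirms. `ndcg_at_k` and `mean_by_category` are hypothetical names, not functions from the benchmark code.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, Sequence, Set, Tuple


def ndcg_at_k(ranked: Sequence[str], relevant: Set[str], k: int = 10) -> float:
    """Binary-relevance NDCG@k where the ideal ranking counts every annotated target."""
    dcg = sum(
        1.0 / math.log2(rank + 2)
        for rank, item in enumerate(ranked[:k])
        if item in relevant
    )
    # Use the annotation count, not the number of targets that happen to be indexed,
    # so a target missing from the index lowers the score instead of being ignored.
    n_relevant = len(relevant)
    ideal = sum(1.0 / math.log2(rank + 2) for rank in range(min(n_relevant, k)))
    return dcg / ideal if ideal > 0 else 0.0


def mean_by_category(scores: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Average per-query scores within each category label."""
    buckets: Dict[str, list] = defaultdict(list)
    for category, score in scores:
        buckets[category].append(score)
    return {category: sum(vals) / len(vals) for category, vals in buckets.items()}
```

With this definition, a query whose annotated targets are only partly indexed can no longer reach 1.0, which is the practical effect of switching n_relevant from index coverage to annotation count.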
No description provided.